JUBE Interoperability¶
JUBE is a benchmark automation tool developed by the Jülich Supercomputing Centre.
You can find the GitHub repository here: https://github.com/FZJ-JSC/JUBE
And the documentation here: https://apps.fz-juelich.de/jsc/jube/docu/index.html
JUBETemplates¶
JUBE defines the machines on which it operates via “platform.xml” files and job templates. remotemanager
provides interoperability with these definitions via the JUBETemplate
class, which is available at remotemanager.JUBEInterop.
This class provides a modified from_repo
method which is able to pull these files automatically.
Note
By default from_repo
is pointed at the JUBE4MaX repository, however you can change this via the repo
argument (point it at the root of the repository).
You should then specify a path
to the directory containing the platform and template files.
The target file names can be changed by updating platform_name
and template_name.
Since the file names are likely to clash when pulling multiple machines, the files are by default stored in a directory derived from the machine name. You can override this directory with the local_dir
argument.
[2]:
from remotemanager.JUBEInterop import JUBETemplate
template = JUBETemplate.from_repo(path="max-inputs/platforms/cineca/leonardo/booster", local_dir="temp_platform_store")
searching for platform.xml & submit.job at https://gitlab.com/max-centre/JUBE4MaX/-/raw/develop/max-inputs/platforms/cineca/leonardo/booster
Grabbed file 'temp_platform_store/platform.xml'
Grabbed file 'temp_platform_store/submit.job'
After a successful file collection, you will be able to generate jobscripts for this machine. Let's set some basic arguments and print a sample.
[3]:
template.accountno = "test_acc"
template.nodes = 24
template.ncpus = 128
template.ncores = 128
template.taskspernode = 32
template.executable = "bigdft"
[4]:
script = template.script()
print(script)
#!/bin/bash
#SBATCH --nodes=24
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --time=24:00:00
#SBATCH --exclusive
#SBATCH --account=#ACCOUNT_NO#
#SBATCH --partition=boost_usr_prod
#SBATCH --qos=normal
module purge
export OMP_NUM_THREADS=1
scontrol show jobid -dd $SLURM_JOB_ID > scontrol.out
sacct -j $SLURM_JOB_ID --long > sacct.out
touch #READY#
Parameterisation¶
Since these platforms are intended for use within the JUBE infrastructure, you will need to be careful to set the correct parameters. If you’re not sure which parameters to set, you can check the downloaded files, or print the arguments
property.
[5]:
print(template.arguments)
['jube_benchmark_name', 'queue', 'timelimit', 'starter', 'args_starter', 'measurement', 'outlogfile', 'errlogfile', 'executable', 'args_executable', 'touch $ready_file', 'nodes', 'threadspertask', 'taskspernode', 'taskspersocket', 'cpuspertask', 'pe', 'gres', 'accountno', 'qos', 'modules', 'preprocess', 'postprocess', 'wrapname', 'wrappre', 'wrappost', 'ready_file', 'make', 'cc', 'cflags', 'mpi_cc', 'mpi_cxx', 'mpi_f90', 'mpi_f77', 'load_module', 'mapping', 'submit', 'submit_script', 'shared_folder', 'shared_job_info', 'nsocket', 'nodecpus', 'ncores', 'threads', 'env', 'ngpus', 'tasks', 'memnodemachine', 'minmem']
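With the full list in hand, it can be useful to check which arguments still have no value before generating a script. The helper below is hypothetical (not part of remotemanager), and uses a stand-in class purely for illustration; it assumes that set values appear as attributes on the template, as in the cells above.

```python
def unset_arguments(template, required):
    """Return the subset of `required` names with no value set on `template`."""
    return [name for name in required if getattr(template, name, None) is None]

class FakeTemplate:
    """Stand-in for JUBETemplate, for illustration only."""
    arguments = ["nodes", "taskspernode", "accountno", "qos"]

t = FakeTemplate()
t.nodes = 24
t.accountno = "test_acc"

# taskspernode and qos were never assigned, so they are reported as unset
print(unset_arguments(t, t.arguments))  # → ['taskspernode', 'qos']
```

On a real JUBETemplate you would pass `template` and `template.arguments` instead of the stand-in.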
Missing Parameters¶
By default, when encountering a missing argument for a substitution, BaseComputer
will delete the whole line. The rationale is that a jobscript line such as #PRAGMA flag=#argument#
is best deleted entirely when the argument is empty, since leaving it in would cause the job to fail.
JUBETemplate
uses the “local” empty behaviour for substitutions by default. This means that any missing parameters are removed “locally” (only the placeholder is dropped), not globally (the whole line).
If you look at the earlier example and compare it with the template, you can see many examples of this. On the mpirun
call, for instance, several arguments are missing, yet the line is still present.
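The difference between the two behaviours can be sketched with plain string substitution. This is an illustration of the idea only, not remotemanager's actual implementation; the placeholder names are hypothetical.

```python
import re

line = "mpirun -np #TASKS# #ARGS_STARTER# #EXECUTABLE#"
values = {"TASKS": "32", "EXECUTABLE": "bigdft"}  # ARGS_STARTER is missing

def substitute_local(line, values):
    """'Local' behaviour: drop only the missing #PLACEHOLDER#, keep the line."""
    def repl(match):
        return values.get(match.group(1), "")
    return re.sub(r"#(\w+)#", repl, line).strip()

def substitute_global(line, values):
    """'Global' behaviour: drop the whole line if any placeholder is missing."""
    names = re.findall(r"#(\w+)#", line)
    if any(name not in values for name in names):
        return None
    return substitute_local(line, values)

print(substitute_local(line, values))   # line survives, missing argument dropped
print(substitute_global(line, values))  # None: the whole line is deleted
```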
Temporary Values¶
Just like BaseComputer is able to accept “temporary” values within the script()
method, so is JUBETemplate.
[6]:
print(template.script(nodes=128))
#!/bin/bash
#SBATCH --nodes=128
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:4
#SBATCH --time=24:00:00
#SBATCH --exclusive
#SBATCH --account=#ACCOUNT_NO#
#SBATCH --partition=boost_usr_prod
#SBATCH --qos=boost_qos_bprod
module purge
export OMP_NUM_THREADS=1
scontrol show jobid -dd $SLURM_JOB_ID > scontrol.out
sacct -j $SLURM_JOB_ID --long > sacct.out
touch #READY#
Just like BaseComputer's
temporary values, these exist for only a single call.
[7]:
nodes = template.script().split("\n")[1]
print(nodes)
#SBATCH --nodes=24
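The single-call behaviour shown above can be sketched as a dictionary merge: overrides passed to script() take precedence for that call only, while the stored state is untouched. The class below is an illustration of this idea, not remotemanager's implementation.

```python
class Sketch:
    """Minimal stand-in showing the temporary-value idea."""

    def __init__(self, **stored):
        self.stored = stored  # persistent values, set once

    def script(self, **temporary):
        # temporary values win, but only within this merged dict;
        # self.stored is never modified
        values = {**self.stored, **temporary}
        return f"#SBATCH --nodes={values['nodes']}"

t = Sketch(nodes=24)
print(t.script(nodes=128))  # → #SBATCH --nodes=128
print(t.script())           # → #SBATCH --nodes=24 (stored value unchanged)
```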